Stemming Strategies for European Languages

نویسنده

  • Jacques Savoy
چکیده

In this paper, we describe and evaluate different general stemming approaches for the French, Portuguese (Brazilian), German and Hungarian languages. Based on the CLEF test-collections, we demonstrate that light stemming approaches are quite effective for the French, Portuguese and Hungarian languages, and perform reasonably well for the German language. Variations in mean average precision among the different stemmers are also evaluated and are sometimes found to be statistically significant.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Monolingual Document Retrieval: English versus other European Languages

The vast majority of research in information retrieval is done using English collections and topics. This raises questions about the effectiveness of retrieval strategies for other languages. To examine this issue, we focus on document retrieval in nine European languages. In particular, we investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming an...

متن کامل

Lexical and Algorithmic Stemming Compared for 9 European Languages with Hummingbird SearchServerTM at CLEF 2003

Hummingbird participated in the monolingual information retrieval tasks of the Cross-Language Evaluation Forum (CLEF) 2003: for natural language queries in 9 European languages (German, French, Italian, Spanish, Dutch, Finnish, Swedish, Russian and English) find all the relevant documents (with high precision) in the CLEF 2003 document sets. For each language, SearchServer scored higher than th...

متن کامل

Data Fusion for Effective European Monolingual Information Retrieval

For our fourth participation in the CLEF evaluation campaigns, our first objective was to propose an effective and general stopword list and a light stemming procedure for the Portuguese language. Our second objective was to obtain a better picture of the relative merit of various search engines when processing documents in the Finnish and Russian languages. Finally, based on the Z-score method...

متن کامل

Terrier takes on the non-English Web

The aim of this work is to identify how standard Information Retrieval (IR) techniques can be adapted in Web retrieval for non-English queries. In particular, we address the challenge of stemming queries and documents in a multilingual setting. Experiments with a multilingual collection of over 20 languages, more than 800 queries, and various stemming strategies in these languages reveal that u...

متن کامل

Stemming Approaches for East European Languages

During this CLEF evaluation campaign, the first objective is to propose and evaluate various indexing and search strategies for the Czech language that will hopefully result in more effective retrieval than language-independent approaches (n-gram). Based on the stemming strategy we developed for other languages, we propose that for the Slavic language a light stemmer (inflectional only) and als...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010